---
title: "Priority Watersheds for Wetland Conservation"
format:
gfm: default
html:
embed-resources: true
code-fold: true
code-tools: true
toc: true
toc-depth: 3
---
## Priority Watershed Analysis Methodology
This analysis identifies priority watersheds for wetland conservation based on a composite score that integrates multiple conservation criteria. We use HydroBASINS Level 6 watersheds as our spatial units of analysis.
### Data Sources
The analysis integrates the following global datasets:
1. **Wetlands**: Global Lakes and Wetlands Database (GLWD) - provides wetland extent and type
2. **Carbon Storage**: Vulnerable carbon data from Conservation International - represents irrecoverable carbon stocks
3. **Protected Areas**: World Database on Protected Areas (WDPA) - identifies existing conservation coverage
4. **Biodiversity Value**: Nature's Contributions to People (NCP) scores - biodiversity and ecosystem service importance
5. **Watersheds**: HydroBASINS Level 6 - watershed boundaries for analysis units
### Composite Score Calculation
Each watershed receives a **composite score** (0-1 scale) based on four equally-weighted criteria. **Scores are normalized within each country** to identify relative conservation priorities within that country, not for cross-country comparison.
#### 1. Wetland Area (25% weight)
- **Metric**: Total hectares of wetlands within the watershed
- **Normalization**: Scaled relative to the largest wetland area in the country
- **Rationale**: Larger wetland areas provide greater ecosystem services and habitat
#### 2. Vulnerable Carbon (25% weight)
- **Metric**: Total vulnerable carbon stocks (metric tons) in wetland areas
- **Normalization**: Scaled relative to the highest carbon stock in the country
- **Rationale**: Identifies wetlands whose loss would release significant greenhouse gases
#### 3. Protection Gap (25% weight)
- **Metric**: Fraction of wetland area NOT currently protected (1 - protected_fraction)
- **Range**: 0 (fully protected) to 1 (completely unprotected)
- **Rationale**: Prioritizes watersheds with low current protection status
#### 4. Biodiversity & Ecosystem Services (25% weight)
- **Metric**: Average Nature's Contributions to People (NCP) score
- **Range**: 0 (low importance) to 1 (high importance)
- **Rationale**: Identifies areas critical for biodiversity and human well-being
### Formula
```
composite_score = (
(wetland_area / max_wetland_area) * 0.25 +
(total_carbon / max_total_carbon) * 0.25 +
(1 - protected_fraction) * 0.25 +
avg_ncp_score * 0.25
)
```
### Important Notes
- **GLWD Multiple Categories**: A single hexagon in GLWD can have multiple wetland type codes. We use `n_distinct(h8)` to avoid counting the same location multiple times.
- **H3 Hexagons**: All data is indexed using H3 hexagons at resolution 8, where each hex = 73.73 hectares
- **Country-Specific Normalization**: Scores are normalized within each country to identify relative priorities within that country
- **Regional Context**: Each basin is associated with its administrative region for spatial orientation
- **Output Files**: Analysis produces both summary files (top 10 basins) and complete files (all basins) for each country
## Setup
```{r setup, message=FALSE, warning=FALSE}
library(duckdbfs)
library(dplyr)
library(dbplyr)
library(ggplot2)
library(tidyr)
library(patchwork)
# Configure DuckDB connection to S3
duckdb_secrets(
key = "",
secret = "",
endpoint = "minio.carlboettiger.info"
)
# H3 hexagon area constant
hex_hectares <- 73.7327598
# Load datasets
countries <- open_dataset(
"s3://public-overturemaps/hex/countries.parquet",
recursive = FALSE
)
regions <- open_dataset("s3://public-overturemaps/hex/regions/**")
wetlands <- open_dataset("s3://public-wetlands/glwd/hex/**")
hydrobasins <- open_dataset("s3://public-hydrobasins/level_06/hexes/**")
carbon <- open_dataset("s3://public-carbon/hex/vulnerable-carbon/**")
wdpa <- open_dataset("s3://public-wdpa/hex/**")
ncp <- open_dataset("s3://public-ncp/hex/ncp_biod_nathab/**")
```
## Analysis Functions
```{r}
# Calculate basin metrics for a country
calculate_basin_metrics <- function (country_code) {
basin_wetlands <- countries |>
filter (country == country_code) |>
inner_join (
wetlands,
by = c ("h8" , "h0" ),
suffix = c ("_country" , "_wetland" )
) |>
filter (Z > 0 ) |>
inner_join (hydrobasins, by = c ("h8" , "h0" ), suffix = c ("" , "_basin" )) |>
left_join (
regions |> select (h8, h0, region_name = name),
by = c ("h8" , "h0" )
) |>
select (
basin_id = id_basin,
PFAF_ID,
UP_AREA,
SUB_AREA,
h8,
h0,
region_name
)
basin_metrics <- basin_wetlands |>
left_join (carbon, by = c ("h8" , "h0" )) |>
left_join (
wdpa |> select (h8, h0, wdpa_present = OBJECTID),
by = c ("h8" , "h0" )
) |>
left_join (ncp, by = c ("h8" , "h0" )) |>
group_by (basin_id, PFAF_ID, UP_AREA, SUB_AREA, region_name) |>
summarise (
wetland_hex_count = n_distinct (h8),
total_carbon = round (coalesce (sum (carbon, na.rm = TRUE ), 0 ), 2 ),
protected_fraction = round (
n_distinct (h8[! is.na (wdpa_present)]) / n_distinct (h8),
3
),
avg_ncp_score = round (mean (ncp, na.rm = TRUE ), 3 ),
.groups = "drop"
) |>
mutate (
wetland_area_hectares = round (wetland_hex_count * hex_hectares, 2 )
)
all_basins <- basin_metrics |>
mutate (
norm_wetland_area = wetland_area_hectares /
max (wetland_area_hectares, na.rm = TRUE ),
norm_carbon = total_carbon / max (total_carbon, na.rm = TRUE ),
protection_gap = 1 - protected_fraction,
composite_score = round (
(norm_wetland_area *
0.25 +
norm_carbon * 0.25 +
protection_gap * 0.25 +
coalesce (avg_ncp_score, 0 ) * 0.25 ),
3
)
) |>
select (
basin_id,
PFAF_ID,
region_name,
upstream_area_km2 = UP_AREA,
basin_area_km2 = SUB_AREA,
wetland_hex_count,
wetland_area_hectares,
total_carbon,
protected_fraction,
avg_ncp_score,
norm_wetland_area,
norm_carbon,
protection_gap,
composite_score
) |>
filter (wetland_hex_count > 0 ) |>
arrange (desc (composite_score)) |>
collect ()
return (all_basins)
}
# Save basin results to CSV
save_basin_results <- function (basin_data, country_code, country_name) {
filename <- paste0 (
'priority_hydrobasins_all_' ,
tolower (country_code),
'.csv'
)
basin_data |>
mutate (
country = country_name,
country_code = country_code
) |>
write.csv (filename, row.names = FALSE )
invisible (filename)
}
# Create country visualization
plot_country_analysis <- function (basin_data, country_name, top_n = 10 ) {
top_basins <- basin_data |> head (top_n)
plot_data <- top_basins |>
mutate (
rank = row_number (),
basin_label = paste0 ("#" , rank, ": " , region_name)
) |>
arrange (rank)
# A. Composite Score by Basin
p1 <- plot_data |>
ggplot (aes (x = reorder (basin_label, - rank), y = composite_score)) +
geom_col (fill = "#66c2a5" ) +
coord_flip () +
labs (
title = paste ("A. Top" , top_n, "Priority Basins" ),
x = NULL ,
y = "Composite Score"
) +
theme_minimal () +
theme (panel.grid.major.y = element_blank ())
# B. Component Scores Breakdown
p2 <- plot_data |>
select (
basin_label,
rank,
` Wetland Area ` = norm_wetland_area,
` Carbon Storage ` = norm_carbon,
` Protection Gap ` = protection_gap,
` NCP Score ` = avg_ncp_score
) |>
pivot_longer (
cols = c (
` Wetland Area ` ,
` Carbon Storage ` ,
` Protection Gap ` ,
` NCP Score `
),
names_to = "metric" ,
values_to = "score"
) |>
ggplot (aes (x = reorder (basin_label, - rank), y = score, fill = metric)) +
geom_col (position = "dodge" ) +
coord_flip () +
scale_fill_manual (
values = c (
"Wetland Area" = "#8dd3c7" ,
"Carbon Storage" = "#bebada" ,
"Protection Gap" = "#fb8072" ,
"NCP Score" = "#80b1d3"
)
) +
labs (
title = "B. Conservation Criteria Breakdown" ,
x = NULL ,
y = "Normalized Score" ,
fill = "Metric"
) +
theme_minimal () +
theme (panel.grid.major.y = element_blank ())
# C. Distribution of all basins with top 10 highlighted
p3 <- basin_data |>
mutate (is_top10 = basin_id %in% top_basins$ basin_id) |>
ggplot (aes (x = composite_score, fill = is_top10)) +
geom_histogram (bins = 30 , alpha = 0.8 ) +
scale_fill_manual (
values = c ("TRUE" = "#fc8d62" , "FALSE" = "#e5e5e5" ),
labels = c ("TRUE" = "Top 10" , "FALSE" = "Other basins" )
) +
labs (
title = "C. Distribution of All Basin Scores" ,
x = "Composite Score" ,
y = "Number of Basins" ,
fill = NULL
) +
theme_minimal ()
# D. Wetland Area vs Protection Status
p4 <- plot_data |>
ggplot (aes (
x = wetland_area_hectares,
y = protected_fraction,
size = total_carbon,
color = avg_ncp_score
)) +
geom_point (alpha = 0.7 ) +
scale_size_continuous (range = c (3 , 10 )) +
scale_color_gradient (low = "#fee5d9" , high = "#a50f15" ) +
labs (
title = "D. Wetland Area vs Protection Status" ,
x = "Wetland Area (hectares)" ,
y = "Protected Fraction" ,
size = "Carbon" ,
color = "NCP Score"
) +
theme_minimal ()
# Combine all plots
(p1 | p2) /
(p3 | p4) +
plot_annotation (
title = paste ("Priority Watershed Analysis:" , country_name),
subtitle = paste (
"Top" ,
top_n,
"of" ,
nrow (basin_data),
"total basins with wetlands"
)
)
}
```
## North America: United States, Canada, and Mexico
### United States
```{r}
us_basins <- calculate_basin_metrics ('US' )
save_basin_results (us_basins, 'US' , 'United States' )
```
```{r}
#| label: tbl-us-top10
#| tbl-cap: "Top 10 priority basins in the United States"
us_basins |>
head (10 ) |>
select (
basin_id,
region_name,
wetland_area_hectares,
total_carbon,
protected_fraction,
avg_ncp_score,
composite_score
) |>
knitr:: kable (digits = 2 )
```
```{r}
#| label: fig-us
#| fig-cap: "Priority watershed analysis for the United States"
#| fig-width: 16
#| fig-height: 12
plot_country_analysis (us_basins, "United States" )
```
### Canada
```{r}
canada_basins <- calculate_basin_metrics ('CA' )
save_basin_results (canada_basins, 'CA' , 'Canada' )
```
```{r}
#| label: tbl-canada-top10
#| tbl-cap: "Top 10 priority basins in Canada"
canada_basins |>
head (10 ) |>
select (
basin_id,
region_name,
wetland_area_hectares,
total_carbon,
protected_fraction,
avg_ncp_score,
composite_score
) |>
knitr:: kable (digits = 2 )
```
```{r}
#| label: fig-canada
#| fig-cap: "Priority watershed analysis for Canada"
#| fig-width: 16
#| fig-height: 12
plot_country_analysis (canada_basins, "Canada" )
```
### Mexico
```{r}
mexico_basins <- calculate_basin_metrics ('MX' )
save_basin_results (mexico_basins, 'MX' , 'Mexico' )
```
```{r}
#| label: tbl-mexico-top10
#| tbl-cap: "Top 10 priority basins in Mexico"
mexico_basins |>
head (10 ) |>
select (
basin_id,
region_name,
wetland_area_hectares,
total_carbon,
protected_fraction,
avg_ncp_score,
composite_score
) |>
knitr:: kable (digits = 2 )
```
```{r}
#| label: fig-mexico
#| fig-cap: "Priority watershed analysis for Mexico"
#| fig-width: 16
#| fig-height: 12
plot_country_analysis (mexico_basins, "Mexico" )
```
## Asia: China, South Korea, and Thailand
### China
```{r}
china_basins <- calculate_basin_metrics ('CN' )
save_basin_results (china_basins, 'CN' , 'China' )
```
```{r}
#| label: tbl-china-top10
#| tbl-cap: "Top 10 priority basins in China"
china_basins |>
head (10 ) |>
select (
basin_id,
region_name,
wetland_area_hectares,
total_carbon,
protected_fraction,
avg_ncp_score,
composite_score
) |>
knitr:: kable (digits = 2 )
```
```{r}
#| label: fig-china
#| fig-cap: "Priority watershed analysis for China"
#| fig-width: 16
#| fig-height: 12
plot_country_analysis (china_basins, "China" )
```
### South Korea
```{r}
korea_basins <- calculate_basin_metrics ('KR' )
save_basin_results (korea_basins, 'KR' , 'South Korea' )
```
```{r}
#| label: tbl-korea-top10
#| tbl-cap: "Top 10 priority basins in South Korea"
korea_basins |>
head (10 ) |>
select (
basin_id,
region_name,
wetland_area_hectares,
total_carbon,
protected_fraction,
avg_ncp_score,
composite_score
) |>
knitr:: kable (digits = 2 )
```
```{r}
#| label: fig-korea
#| fig-cap: "Priority watershed analysis for South Korea"
#| fig-width: 16
#| fig-height: 12
plot_country_analysis (korea_basins, "South Korea" )
```
### Thailand
```{r}
thailand_basins <- calculate_basin_metrics ('TH' )
save_basin_results (thailand_basins, 'TH' , 'Thailand' )
```
```{r}
#| label: tbl-thailand-top10
#| tbl-cap: "Top 10 priority basins in Thailand"
thailand_basins |>
head (10 ) |>
select (
basin_id,
region_name,
wetland_area_hectares,
total_carbon,
protected_fraction,
avg_ncp_score,
composite_score
) |>
knitr:: kable (digits = 2 )
```
```{r}
#| label: fig-thailand
#| fig-cap: "Priority watershed analysis for Thailand"
#| fig-width: 16
#| fig-height: 12
plot_country_analysis (thailand_basins, "Thailand" )
```
## Europe: United Kingdom, France, and Spain
### United Kingdom
```{r}
uk_basins <- calculate_basin_metrics ('GB' )
save_basin_results (uk_basins, 'GB' , 'United Kingdom' )
```
```{r}
#| label: tbl-uk-top10
#| tbl-cap: "Top 10 priority basins in the United Kingdom"
uk_basins |>
head (10 ) |>
select (
basin_id,
region_name,
wetland_area_hectares,
total_carbon,
protected_fraction,
avg_ncp_score,
composite_score
) |>
knitr:: kable (digits = 2 )
```
```{r}
#| label: fig-uk
#| fig-cap: "Priority watershed analysis for the United Kingdom"
#| #| fig-width: 16
#| fig-height: 12
plot_country_analysis (uk_basins, "United Kingdom" )
```
### France
```{r}
france_basins <- calculate_basin_metrics ('FR' )
save_basin_results (france_basins, 'FR' , 'France' )
```
```{r}
#| label: tbl-france-top10
#| tbl-cap: "Top 10 priority basins in France"
france_basins |>
head (10 ) |>
select (
basin_id,
region_name,
wetland_area_hectares,
total_carbon,
protected_fraction,
avg_ncp_score,
composite_score
) |>
knitr:: kable (digits = 2 )
```
```{r}
#| label: fig-france
#| fig-cap: "Priority watershed analysis for France"
#| fig-width: 16
#| fig-height: 12
plot_country_analysis (france_basins, "France" )
```
### Spain
```{r}
spain_basins <- calculate_basin_metrics ('ES' )
save_basin_results (spain_basins, 'ES' , 'Spain' )
```
```{r}
#| label: tbl-spain-top10
#| tbl-cap: "Top 10 priority basins in Spain"
spain_basins |>
head (10 ) |>
select (
basin_id,
region_name,
wetland_area_hectares,
total_carbon,
protected_fraction,
avg_ncp_score,
composite_score
) |>
knitr:: kable (digits = 2 )
```
```{r}
#| label: fig-spain
#| fig-cap: "Priority watershed analysis for Spain"
#| fig-width: 16
#| fig-height: 12
plot_country_analysis (spain_basins, "Spain" )
```
## South America: Brazil and Chile
### Brazil
```{r}
brazil_basins <- calculate_basin_metrics ('BR' )
save_basin_results (brazil_basins, 'BR' , 'Brazil' )
```
```{r}
#| label: tbl-brazil-top10
#| tbl-cap: "Top 10 priority basins in Brazil"
brazil_basins |>
head (10 ) |>
select (
basin_id,
region_name,
wetland_area_hectares,
total_carbon,
protected_fraction,
avg_ncp_score,
composite_score
) |>
knitr:: kable (digits = 2 )
```
```{r}
#| label: fig-brazil
#| fig-cap: "Priority watershed analysis for Brazil"
#| fig-width: 16
#| fig-height: 12
plot_country_analysis (brazil_basins, "Brazil" )
```
### Chile
```{r}
chile_basins <- calculate_basin_metrics ('CL' )
save_basin_results (chile_basins, 'CL' , 'Chile' )
```
```{r}
#| label: tbl-chile-top10
#| tbl-cap: "Top 10 priority basins in Chile"
chile_basins |>
head (10 ) |>
select (
basin_id,
region_name,
wetland_area_hectares,
total_carbon,
protected_fraction,
avg_ncp_score,
composite_score
) |>
knitr:: kable (digits = 2 )
```
```{r}
#| label: fig-chile
#| fig-cap: "Priority watershed analysis for Chile"
#| fig-width: 16
#| fig-height: 12
plot_country_analysis (chile_basins, "Chile" )
```
## Australia
```{r}
australia_basins <- calculate_basin_metrics ('AU' )
save_basin_results (australia_basins, 'AU' , 'Australia' )
```
```{r}
#| label: tbl-australia-top10
#| tbl-cap: "Top 10 priority basins in Australia"
australia_basins |>
head (10 ) |>
select (
basin_id,
region_name,
wetland_area_hectares,
total_carbon,
protected_fraction,
avg_ncp_score,
composite_score
) |>
knitr:: kable (digits = 2 )
```
```{r}
#| label: fig-australia
#| fig-cap: "Priority watershed analysis for Australia"
#| fig-width: 16
#| fig-height: 12
plot_country_analysis (australia_basins, "Australia" )
```
## India
```{r}
india_basins <- calculate_basin_metrics ('IN' )
save_basin_results (india_basins, 'IN' , 'India' )
```
```{r}
#| label: tbl-india-top10
#| tbl-cap: "Top 10 priority basins in India"
india_basins |>
head (10 ) |>
select (
basin_id,
region_name,
wetland_area_hectares,
total_carbon,
protected_fraction,
avg_ncp_score,
composite_score
) |>
knitr:: kable (digits = 2 )
```
```{r}
#| label: fig-india
#| fig-cap: "Priority watershed analysis for India"
#| fig-width: 16
#| fig-height: 12
plot_country_analysis (india_basins, "India" )
```
## Summary: Combined Results
```{r}
all_results <- bind_rows (
us_basins |>
head (10 ) |>
mutate (country = "United States" , country_code = "US" ),
canada_basins |>
head (10 ) |>
mutate (country = "Canada" , country_code = "CA" ),
mexico_basins |>
head (10 ) |>
mutate (country = "Mexico" , country_code = "MX" ),
china_basins |> head (10 ) |> mutate (country = "China" , country_code = "CN" ),
korea_basins |>
head (10 ) |>
mutate (country = "South Korea" , country_code = "KR" ),
thailand_basins |>
head (10 ) |>
mutate (country = "Thailand" , country_code = "TH" ),
uk_basins |>
head (10 ) |>
mutate (country = "United Kingdom" , country_code = "GB" ),
france_basins |>
head (10 ) |>
mutate (country = "France" , country_code = "FR" ),
spain_basins |> head (10 ) |> mutate (country = "Spain" , country_code = "ES" ),
brazil_basins |>
head (10 ) |>
mutate (country = "Brazil" , country_code = "BR" ),
chile_basins |> head (10 ) |> mutate (country = "Chile" , country_code = "CL" ),
australia_basins |>
head (10 ) |>
mutate (country = "Australia" , country_code = "AU" ),
india_basins |> head (10 ) |> mutate (country = "India" , country_code = "IN" )
)
write.csv (
all_results,
'priority_hydrobasins_top10_summary.csv' ,
row.names = FALSE
)
```
```{r}
#| label: tbl-summary
#| tbl-cap: "Top 10 priority basins per country"
all_results |>
select (
country,
basin_id,
region_name,
wetland_area_hectares,
total_carbon,
protected_fraction,
avg_ncp_score,
composite_score
) |>
knitr:: kable (digits = 2 )
```
## Cross-Country Overview
```{r}
#| label: fig-cross-country
#| fig-cap: "Distribution of composite scores across countries (top 10 basins per country). Note: Scores normalized within each country."
#| fig-width: 12
#| fig-height: 6
all_results |>
ggplot (aes (
x = reorder (country, composite_score, FUN = median),
y = composite_score
)) +
geom_boxplot (fill = "#66c2a5" , alpha = 0.7 ) +
geom_jitter (width = 0.2 , alpha = 0.5 , size = 2 ) +
coord_flip () +
labs (
title = "Distribution of Composite Scores: Top 10 Basins Per Country" ,
subtitle = "Scores normalized within each country - not directly comparable across countries" ,
x = NULL ,
y = "Composite Score (country-specific)"
) +
theme_minimal () +
theme (panel.grid.major.y = element_blank ())
```
## Key Findings
### Methodology
For each country, we identified the top 10 Level 6 HydroBASINS based on a composite score that equally weights four key metrics:
1. **Wetland Area (25%)**: Total hectares of wetlands from GLWD
2. **Carbon Storage (25%)**: Vulnerable carbon in wetlands
3. **Protection Status (25%)**: Fraction of wetlands within WDPA protected areas
4. **Nature's Contributions (25%)**: Average NCP biodiversity score
### Interpretation
The composite score helps identify hydrobasins that balance multiple conservation priorities:
- High wetland area indicates ecological significance
- High carbon storage suggests climate mitigation importance
- Low protection fraction highlights conservation gaps
- High NCP scores indicate biodiversity value and ecosystem services
### Outputs
For each country, the analysis produces:
1. **Complete basin file** (`priority_hydrobasins_all_<country>.csv` ): All basins with wetlands, ranked by composite score
2. **Country visualization** (`priority_hydrobasins_<country>.png` ): Four-panel figure showing:
- Top 10 basins by composite score
- Breakdown of the four conservation criteria
- Distribution of all basin scores
- Relationship between wetland area and protection status
3. **Regional identification**: Each basin labeled with its administrative region for spatial context